A tutorial on building a private, offline Retrieval Augmented Generation (RAG) system using Ollama for embeddings and language generation, and FAISS for vector storage, ensuring data privacy and control.
1. **Document Loader:** Extracts text from various file formats (PDF, Markdown, HTML) while preserving metadata like source and page numbers for accurate citations.
2. **Text Chunker:** Splits documents into smaller text segments (chunks) to manage token limits and improve retrieval accuracy. It uses overlap and sentence-boundary detection to maintain context.
3. **Embedder:** Converts text chunks into numerical vectors (embeddings) using the `nomic-embed-text` model via Ollama, which runs locally without internet access.
4. **Vector Database:** Stores the embeddings using FAISS (Facebook AI Similarity Search) for fast similarity search. It uses cosine similarity for accurate retrieval and saves the database to disk for quick loading in future sessions.
5. **Large Language Model (LLM):** Generates answers using the `llama3.2` model via Ollama, also running locally. It takes the retrieved context and the user's question to produce a response with citations.
6. **RAG System Orchestrator:** Coordinates the entire workflow, managing the ingestion of documents (loading, chunking, embedding, storing) and the querying process (retrieving relevant chunks, generating answers); see the end-to-end sketch after this list.
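Tying these components together, here is a minimal end-to-end sketch. It assumes the `ollama` and `faiss-cpu` Python packages and a local Ollama server with `nomic-embed-text` and `llama3.2` pulled; helper names such as `chunk_text` and the file paths are illustrative, not taken from the tutorial.

```python
import faiss
import numpy as np
import ollama

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (a naive chunker)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(texts: list[str]) -> np.ndarray:
    """Embed chunks locally via Ollama; L2-normalize so inner product = cosine."""
    vecs = np.array(
        [ollama.embeddings(model="nomic-embed-text", prompt=t)["embedding"] for t in texts],
        dtype="float32",
    )
    faiss.normalize_L2(vecs)
    return vecs

# Ingestion: load, chunk, embed, and persist the index to disk.
chunks = chunk_text(open("docs/guide.txt").read())  # illustrative path
index = faiss.IndexFlatIP(768)  # nomic-embed-text produces 768-dim vectors
index.add(embed(chunks))
faiss.write_index(index, "rag.faiss")

# Querying: retrieve top-k chunks, then generate an answer with the local LLM.
question = "How do I configure the service?"
_, ids = index.search(embed([question]), k=3)
context = "\n\n".join(chunks[i] for i in ids[0])
reply = ollama.chat(model="llama3.2", messages=[
    {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
])
print(reply["message"]["content"])
```

Normalizing the vectors and using an inner-product index is the standard way to get the cosine similarity mentioned above in FAISS, and `faiss.read_index("rag.faiss")` restores the persisted index in a later session.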
This post explores how NVIDIA cuVS integrates with Meta's Faiss library to address challenges in vector search. It covers the benefits of the integration, performance improvements, benchmarks, and code examples.
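As a rough illustration of the integration, the sketch below moves a brute-force index onto the GPU. It assumes a Faiss build compiled with GPU and cuVS support; the `use_cuvs` config flag follows recent Faiss releases (older builds named it `use_raft`), so check your version.

```python
import faiss  # requires a GPU build of Faiss compiled with cuVS support
import numpy as np

d = 128
xb = np.random.rand(100_000, d).astype("float32")  # database vectors
xq = np.random.rand(10, d).astype("float32")       # query vectors

res = faiss.StandardGpuResources()
config = faiss.GpuIndexFlatConfig()
config.use_cuvs = True  # assumption: flag name in recent Faiss releases
index = faiss.GpuIndexFlatL2(res, d, config)  # brute-force L2 index on GPU
index.add(xb)
distances, ids = index.search(xq, 5)  # top-5 neighbors per query
```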
This paper addresses the misalignment between traditional IR evaluation metrics and the requirements of modern Retrieval-Augmented Generation (RAG) systems. It proposes a novel annotation schema and the UDCG metric to better evaluate retrieval quality when the consumer of the retrieved passages is an LLM rather than a human.
Plural is bringing AI into the DevOps lifecycle with a new release that leverages a unified GitOps platform as a RAG engine. This provides AI-powered troubleshooting, natural language infrastructure querying, autonomous upgrade assistance, and agentic workflows for infrastructure modification, all with enterprise-grade guardrails.
A curated collection of Awesome LLM apps built with RAG, AI Agents, Multi-agent Teams, MCP, Voice Agents, and more. This repository features LLM apps that use models from OpenAI, Anthropic, Google, and xAI, as well as open-source models like Qwen and Llama.
The article explores whether combining a command-line agent (like Claude Code or Gemini CLI) with Unix-like file system tools and SemTools is sufficient for complex tasks, particularly document search. It details a benchmark that tests coding agents with and without SemTools on search, cross-referencing, and temporal-analysis tasks. The conclusion: CLI access alone is powerful, and SemTools further enhances agents' document-search and RAG capabilities.
Nvidia’s NeMo Retriever models and RAG pipeline make quick work of ingesting PDFs and generating reports based on them. Chalk one up for the plan-reflect-refine architecture.
Sparse Priming Representations (SPR) is a research project focused on developing and sharing techniques for efficiently representing complex ideas, memories, or concepts as a minimal set of keywords, phrases, or statements. From that sparse representation, a language model or subject-matter expert can quickly reconstruct the original idea with minimal context.
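For a concrete feel of the idea, here is a small sketch that compresses and reconstructs text with a local model via Ollama; the prompts paraphrase the SPR concept and are not the project's actual prompts.

```python
import ollama

COMPRESS = ("Distill the following text into a sparse priming representation: "
            "a minimal list of keywords, phrases, and assertions from which "
            "the full idea can be reconstructed.\n\n{text}")
EXPAND = ("Reconstruct the original idea, in full prose, from this sparse "
          "priming representation:\n\n{spr}")

def spr_compress(text: str, model: str = "llama3.2") -> str:
    """Compress prose into a sparse list of priming statements."""
    return ollama.generate(model=model, prompt=COMPRESS.format(text=text))["response"]

def spr_expand(spr: str, model: str = "llama3.2") -> str:
    """Reconstruct the original idea from the sparse representation."""
    return ollama.generate(model=model, prompt=EXPAND.format(spr=spr))["response"]
```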
Scaling a simple RAG pipeline from short notes to full books. This post explains how to handle larger files in your RAG pipeline by adding an extra step to the process: chunking.
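A minimal sketch of that extra step, assuming plain-text input: split on sentence boundaries and carry a small overlap so each chunk keeps neighboring context. The function name and parameters are illustrative.

```python
import re

def chunk_book(text: str, max_chars: int = 1000, overlap_sentences: int = 1) -> list[str]:
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current, length = [], [], 0
    for sentence in sentences:
        if length + len(sentence) > max_chars and current:
            chunks.append(" ".join(current))
            # Start the next chunk with the last sentence(s) for overlap.
            current = current[-overlap_sentences:]
            length = sum(len(s) for s in current)
        current.append(sentence)
        length += len(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```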
An end-to-end raw-text-to-graph pipeline. This blog explores the limitations of LangChain extraction when using smaller quantized models, and how BAML can improve extraction success rates.
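Independent of either library, the pattern being benchmarked looks roughly like the generic baseline below: ask a model for nodes and edges as JSON and parse them into a typed graph. The schema, prompt, and use of a local Ollama model are illustrative assumptions, not taken from the blog.

```python
import json
from dataclasses import dataclass
import ollama

@dataclass
class Edge:
    source: str
    relation: str
    target: str

PROMPT = ("Extract entities and relations from the text below. Respond with "
          'JSON only: {"edges": [{"source": ..., "relation": ..., "target": ...}]}\n\n')

def extract_graph(text: str, model: str = "llama3.2") -> list[Edge]:
    """Prompt a local model for a JSON edge list and parse it into typed edges."""
    raw = ollama.generate(model=model, prompt=PROMPT + text, format="json")["response"]
    return [Edge(**e) for e in json.loads(raw)["edges"]]
```

Smaller quantized models often emit malformed JSON here, which is the failure mode the post attributes to plain prompt-based extraction and the motivation for schema-aware tooling like BAML.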